Large expansions of a non-coding GGGGCC-repeat in the first intron of the C9orf72 gene are a common 
cause of both amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). G-rich sequences 
have a propensity for forming highly stable quadruplex structures in both RNA and DNA, termed 
G-quadruplexes. G-quadruplexes have been shown to be involved in a range of processes including telomere 
stability and RNA transcription, splicing, translation and transport. Here we show, using NMR and CD 
spectroscopy, that the C9orf72 hexanucleotide expansion can form a stable G-quadruplex, which has 
profound implications for disease mechanism in ALS and FTD. 



Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disorder in which the loss of motor 
neurons in the brain and spinal cord causes progressive weakness and paralysis, ultimately leading to death 
from respiratory failure 1. Frontotemporal dementia (FTD) is one of the most common forms of young-onset 
dementia and is characterized by progressive degeneration of the frontal and anterior temporal lobes, 
leading to changes in personality or language impairment 2,3. ALS and FTD share numerous similarities at the 
genetic and neuropathological level, can co-occur clinically, and have been proposed to be part of the 
same spectrum of disease 4. 

Large expansions of the non-coding GGGGCC-repeat in the first intron of the C9orf72 gene have recently been 
demonstrated to cause ALS and FTD 5,6. Whilst the unaffected control population carries <30 repeats of 
this hexanucleotide, approximately 8-10% of European FTD and ALS patients carry very large expansions, which 
have been reported to range between 700 and 1,600 repeats 5. The introns containing these large expansions are 
transcribed, and in patients the GGGGCC-repeat expansion is detectable by in situ hybridization in 
nuclear RNA foci 5. 

Guanine (G)-quadruplexes are highly stable nucleic acid secondary structures formed when short tracts of 
G-rich sequence associate. They can occur in both DNA and RNA and consist of stacks of planar layers 
of G-tetrad units, also named G-quartets, in which the G bases are arranged in a square cyclic pattern held 
together by eight hydrogen bonds (Fig. 1a). When stacked, the tetrads form a central cavity lined by guanine 
O6 atoms, which can interact with metal cations. Monovalent metal cations bound in this central channel 
significantly affect the stability (K+ > Na+ > Li+) and topology of the folded G-quadruplex structure 7. 

The presence of RNA G-quadruplexes has been demonstrated in vitro and in vivo, and they have been found in 
the transcripts of diverse organisms, ranging from viruses to humans 8,9. Computational algorithms used to 
predict G-quadruplex forming sequences suggest that there are approximately 197,000 G-quadruplex forming 
sequences in the human genome 9. Interestingly, they are particularly enriched in the regulatory 5' UTR, 
first intron and 3' UTR regions of transcripts 10,11. RNA G-quadruplexes have been shown to be 
recognized and bound by specific proteins and have been implicated in a wide range of biological processes 
including alternative splicing, RNA transport, translation regulation, RNA degradation and telomere stability 12-14. 

Figure 1 | Schematic representation of the parallel stranded GGGGCC 
RNA G-quadruplex. (a) Tetrads are stabilized by eight hydrogen bonds 
(black lines), with metal ions sitting centrally (orange circle) and 
phosphate backbones laterally (blue squares). (b) Four stacked G-tetrads 
with anti-glycosidic torsion angles, with phosphate backbones connected 
through a propeller loop arrangement comprised of the two cytosines, 
which ensures a parallel topology. 

The GGGGCC expansion of C9orf72 forms runs of adjacent G-repeats, 
which raises the possibility that these could form G-quadruplexes. 
The secondary structure of the hexanucleotide repeat is very likely to 
be involved in determining the proteins it interacts with. These 
interactions may play a critical role in disease, as has been shown in 
Myotonic Dystrophy, where similar non-coding RNA repeat expansions 
form nuclear foci which sequester key RNA-binding proteins, 
thereby causing functional defects 15. 
Thus, given the importance of expanded non-coding repeats 
in other RNA repeat expansion diseases 15, we have investigated 
the secondary structure of the C9orf72 hexanucleotide repeat 
by biophysical methods to determine whether it forms RNA 
G-quadruplexes. 

Results 

In silico analysis predicts C9orf72 GGGGCC-repeats form 
G-quadruplexes. The GGGGCC-expansion lies in the 5' region of 
C9orf72 intron 1 (Fig. 2b). Control individuals are reported to 
have <30 repeats, but the majority of controls have 2 repeats 5,6,16,17. 
Nonetheless, using the EuQuad database 18, which identifies 
G-quadruplex forming sequences in eukaryotic genomes, the three 
GGGGCC-repeats and adjacent GGGGC nucleotides present in 
the human reference C9orf72 gene sequence are recognized as 
a G-quadruplex forming sequence. We therefore termed this 
sequence the minimal C9orf72 G-quadruplex repeat unit (C9Gru). 
The prediction is supported by the G-quadruplex analysis tool QGRS 
Mapper 9: the full C9orf72 genomic sequence containing the C9Gru 
sequence shows one predicted four-stacked G-quadruplex with a 
high predictive score (Fig. 2a). When a pathogenic 
sequence containing 800 hexamer (GGGGCC) repeats is used as input, the 
number of predicted four-stacked G-quadruplexes increases accordingly 
(Fig. 2b). 
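
The kind of motif search that underlies such predictions can be illustrated with a minimal sketch. It is not the EuQuad or QGRS Mapper algorithm and does not compute a G-score; it only looks for four G-tracts of equal length separated by short loops. The function names and the loop-length limits (1 to 7 nucleotides) are assumptions made for illustration.

```python
import re

# C9Gru sequence as given in the Methods: three GGGGCC repeats plus GGGGC.
C9GRU = "GGGGCCGGGGCCGGGGCCGGGGC"


def g4_motif(stack: int, min_loop: int = 1, max_loop: int = 7) -> re.Pattern:
    """Compile a pattern for four G-tracts of length `stack` joined by three loops.

    The loop limits are illustrative assumptions, not values from the paper.
    """
    tract = f"G{{{stack}}}"
    loop = f"[ACGTU]{{{min_loop},{max_loop}}}?"  # lazy: prefer the shortest loop
    return re.compile(f"{tract}(?:{loop}{tract}){{3}}")


def find_g4(seq: str, stack: int = 4):
    """Return (start, end, match) for non-overlapping hits, scanning left to right."""
    pattern = g4_motif(stack)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


# Reference-like sequence: one candidate four-stacked motif.
reference_hits = find_g4(C9GRU)

# Hypothetical expanded allele of 800 uninterrupted GGGGCC repeats.
expanded_hits = find_g4("GGGGCC" * 800)

print(len(reference_hits), len(expanded_hits))
```

Because the scan is non-overlapping, each hit in the expanded sequence consumes four consecutive repeats, so the count grows in proportion to repeat number, in line with the extended region of high G-score described for Fig. 2b.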

GGGGCC-repeats form RNA G-quadruplexes. 1D 1H NMR is a 
key technique in providing unequivocal experimental evidence of 
G-tetrad and quadruplex formation. We used NMR to investigate the 
structure of an RNA oligonucleotide consisting of the C9Gru 
sequence. NMR analysis in 10 mM potassium phosphate buffer, 
pH 7.0, prior to refolding in the presence of quadruplex stabilizing 
cations, revealed the presence of imino proton peaks between 12 and 
14 ppm, a region of the spectrum characteristic of Watson-Crick (W-C) 
hydrogen bonding (Supplementary Fig. S1). Raising the 
potassium chloride concentration to 40 mM and annealing the 
RNA allowed the C9Gru oligonucleotide to be refolded in the 
presence of stabilizing cations. At room temperature the 1D 1H 
NMR spectrum showed a broad envelope of peaks 
between 10 and 11.5 ppm, and several distinct peaks characteristic of 
G-tetrad formation (Fig. 3). The appearance of peaks consistent with 
tetrad formation coincides with the loss of W-C peaks between 12 and 
13 ppm. The buffered C9Gru RNA oligonucleotide was then heated 
stepwise (+5 K) from 273 K to 333 K without any significant 
change in the spectra, indicating the stability of the 
folded quadruplex and retention of the G-tetrads. As clear and 
distinct peaks could not be differentiated within the imino region, 
further structural analysis was not pursued. 

Figure 2 | The C9orf72 hexanucleotide repeat is predicted to form a G-quadruplex structure. The G-quadruplex prediction tool QGRS Mapper was 
used to identify potential G-quadruplex forming sequences in the entire C9orf72 genomic DNA sequence, which is shown with exons highlighted in red 
and the GGGGCC repeats in green. QGRS Mapper provides a G-score (plotted in blue) which indicates the likelihood of G-quadruplex formation. (a) The 
highest G-score in the C9orf72 reference sequence from Ensembl GRCh37 corresponds to the three GGGGCC repeats and adjacent GGGGC, which we 
have termed the minimal C9orf72 G-quadruplex repeat unit (C9Gru). (b) 800 GGGGCC repeats were added, which is representative of the 
disease-causing expansions, leading to an extended region of high G-score. Image is adapted from QGRS Mapper 
(http://bioinformatics.ramapo.edu/QGRS/index.php). E1a = exon 1a; E1b = exon 1b. 

GGGGCC G-quadruplexes are highly stable and parallel oriented. 
G-quadruplexes can adopt three possible topologies according to the 
directions of the strands: parallel, anti-parallel or mixed. 
Riboguanosine has a strong propensity to adopt the anti 
conformation and therefore to give rise to parallel G-quadruplex 
structures. Circular dichroism (CD) spectroscopy is a standard 
method used to analyse structural features of G-quadruplexes 19,20. 
We used this technique to characterize the structural conformation 
of the C9Gru RNA sequence in a 40 mM KCl buffer. This showed a 
positive peak at 262 nm and a negative peak at 237 nm (Fig. 4a), 
which are the hallmarks of a parallel-oriented G-quadruplex 
structure 19. 

To study the stability of the G-quadruplex motif found in the 
C9Gru G-quadruplex structure, we performed melting experiments 
by increasing the temperature from 15°C to 95°C using a 1°C/min 
gradient. The CD spectra were measured at 10°C intervals and 
showed a strong overlay from 15°C to 75°C, with a reduction of the 
262 nm peak only starting at 85°C, indicating a strong stability of the 
formed structure (Fig. 4a). Folding was fully reversible, but lagged 
behind the unfolding, indicating either a slow refolding process or 
intermolecular quadruplex formation (Fig. 4b). 

To gain information on whether the G-quadruplexes are formed 
intermolecularly or intramolecularly, we analysed the melting 
profile after diluting the C9Gru RNA oligonucleotide 10-fold 
(0.46 µM) in buffer containing 40 mM KCl. The spectrum displayed 
the same characteristic maximum and minimum around 262 and 
237 nm, respectively. Furthermore, the structure remained highly 
stable and retained the same amount of structure at 95°C as seen 
in the concentrated sample (Supplementary Fig. S2). This 
concentration independence indicates the formation of an intramolecular G-quadruplex 19. 

GGGGCC G-quadruplex stability is cation dependent. The 
stability of the G-quadruplex structure is strongly influenced by 
the presence of monovalent cations between the G-quartet stacks. 
K+ promotes stable folding over Na+ and Li+ ions. We therefore 
compared the C9Gru RNA structure in the presence of either K+ 
or Li+. CD results showed a reduction of the formed structure in 
the presence of Li+, with a reduction of the 262 nm peak by ~30%, 
and a reduction in stability of the quadruplex and the cooperativity of 
the fold (Fig. 4c). 

Together, these CD data show that GGGGCC RNA repeats fold 
into a very stable, parallel intramolecular G-quadruplex structure. 

Discussion 

Our investigations show that the C9orf72 GGGGCC-hexanucleotide 
repeat forms G-quadruplexes and that these adopt a parallel topology, 
as illustrated in the model in Figure 1b. Our structural data confirm 
in silico predictions that attribute a high probability of forming 
G-quadruplexes to the GGGGCC-repeat. 

With regard to quadruplex topology, in contrast to DNA G-quadruplexes, 
RNA G-quadruplexes generally disfavour folded topologies 
with phosphate backbones running antiparallel and containing mixed 
syn and anti glycosidic bonds. A preference of the ribose sugar puckers 
for the C3'-endo conformation favours the anti conformation 
and so an all-parallel topology 21. Structural studies have suggested 
that stability is derived through the 2'-OH groups and their intra- 
and intermolecular interactions with the ribose O4' atom for C3'-endo 
puckers and the N2 guanine amine for C2'-endo puckers 22. 
However, this preference is not absolute and examples of RNA 
adopting the anti conformation exist 23. Our results show that, similar 
to most other RNA G-quadruplexes, the C9orf72 GGGGCC-repeat 
G-quadruplex adopts a parallel conformation. The GGGGCC-repeat 
G-quadruplexes, similar to other RNA G-quadruplexes, are 
thermodynamically very stable at physiological intracellular potassium 
ion concentrations. Our CD data obtained from the RNA oligomer 
dilution also suggest that quadruplexes form intramolecularly. In 
the context of the C9orf72 expansion, where the hexanucleotide 
repeats number >700, it remains to be assessed whether quadruplexes 
form by association of adjacent repeats or from distant tracts of the 
sequence. 

The ALS-FTD causing expansions in C9orf72 are formed by a 
very large number of repeats, which is thought to range between 
700 and 1,400 hexamers 5. Although we have shown the propensity 
of the basic repeats composing this expansion to form G-quadruplexes, 
the extended structure of this very long RNA sequence is 
not yet known. Evidence of ternary structure from RNA transcripts 
of telomeric repeat (TERRA) sequences containing several 
hundred G-quadruplex-forming hexamers indicates a "beads on a 
string" model with stacked quadruplex dimers, using 3' or 5' 
interfaces, connected by the intervening linkers to the next 
pairing 24,25. Therefore, we would predict that many quadruplexes 
would be formed within the C9orf72 expansion, and that 
quadruplexes would stack as dimers. 

Figure 3 | NMR analysis of the C9orf72 GGGGCC RNA hexanucleotide repeat shows formation of G-quadruplexes. The 1D proton spectrum of the 
C9Gru RNA oligonucleotide annealed in 10 mM K2PO4, 40 mM KCl buffer, pH 7.0, 298 K. Peaks in the imino proton region (arrow) between 10 and 
11.5 ppm correspond to quadruplex formation. 

Figure 4 | CD analysis shows the GGGGCC RNA G-quadruplex structures are very stable, cation dependent and parallel oriented. CD spectra of the 
C9Gru RNA oligonucleotide (4.6 µM) show a positive peak at 262 nm and a negative peak at 237 nm, which is characteristic of parallel-oriented 
G-quadruplex structures. (a) and (b) represent the temperature unfold and refold spectra respectively, with the RNA oligonucleotide in 40 mM KCl, 
10 mM K2PO4 buffer. The peak at 262 nm only decreases at 85°C, indicating a very stable structure. Temperature unfold (c) and refold (d) spectra of identical 
RNA in 50 mM LiCl, 10 mM Na2PO4 buffer. The characteristic cation dependence of G-quadruplex structures was confirmed by the observed reduced 
stability in the presence of Li+ ions. 

Three transcript variants (V1, V2, V3) have been described for the 
C9orf72 gene: V2 and V3 utilize exon 1a and therefore include the 
hexanucleotide repeat, while V1 utilizes the alternative exon 1b, 
therefore excluding the hexanucleotide repeat, which is located 
upstream of the transcription start site 5 (Fig. 2a). Real-time RT-PCR 
data from patient brain and cell lines have shown that the 
presence of the expanded repeat causes a reduction in V1 
transcription 5,16. V2 and V3 were shown to be transcribed in patient frontal 
cortex cDNA, which was confirmed by the detection of the 
hexanucleotide repeats in nuclear foci in patients' brains by in situ 
hybridization 5. A second study used real-time PCR to show that V2 levels 
were reduced in patient frontal cortex 16. Therefore, further studies are 
required to clarify the extent of transcription of the expanded repeat 
in V2 and V3 in relevant patient tissues. As GGGGCC RNA repeats 
form G-quadruplexes, it is likely that the DNA sequence also forms 
these structures. There is evidence that DNA G-quadruplexes on the 
template strand (also known as the non-coding strand) can inhibit 
transcription 26. It has also been suggested that G-quadruplexes 
forming on the coding or non-template strand, as is the case with the 
C9orf72 repeats, could enhance transcription by keeping the 
template strand single stranded 26. This would be consistent with 
transcription through the repeat and the identification of nuclear RNA 
foci containing the repeat in patient tissue 5. Further work will be 
required to determine the extent and mechanism of transcription 
of the C9orf72 repeat sequence. Whether disease is caused by a toxic 
effect of the GGGGCC-repeat RNA, by a loss of function of C9orf72 
or by both mechanisms is yet to be determined. 

Repeat expansions in the 3' UTR of DMPK and in intron 1 of 
ZNF9 were shown to cause Myotonic Dystrophy type 1 and 2 
respectively (DM1 and DM2) 27,28. These mutations are similar to 
C9orf72 expansions in being located in non-coding regions of the 
respective genes and in forming nuclear RNA foci in patient tissue 
and cells. Although loss of transcript has been suggested to play a 
role in both DM1 and DM2 29,30, the fact that two repeat sequences 
located in entirely different genes can cause such similar disease 
features implies a potential common pathogenic mechanism by 
RNA gain-of-function 31. Indeed, it has been shown that RNA 
binding proteins, such as MBNL1, are sequestered away from their 
normal RNA targets by interaction with the expanded repeats 32. 
This leads to aberrant functional downstream effects which 
directly cause aspects of the myotonic dystrophy phenotype 33-35. 
The expanded repeats that cause DM1 fold into a stable hairpin 
structure that mimics the normal MBNL1 RNA-binding site 36,37. 
These findings suggest it is possible that the expanded 
GGGGCC-repeat may act in a similar fashion. In this light, the secondary 
structure of the RNA repeat expansion is crucial in determining 
which proteins it binds to. 

G-quadruplexes have been shown to play a role in a variety of 
biological processes 13,14. A number of these functions are carried 
out by the interaction with G-quadruplex binding proteins, and the 
sequestration of these proteins could lead to downstream effects that 
play a role in the disease pathogenesis. While many DNA 
G-quadruplex binding proteins have been identified, relatively few proteins 
have been confirmed to bind RNA G-quadruplexes 38. A prominent 
example of an RNA G-quadruplex binding protein is the fragile X 
mental retardation protein FMRP 39, but it is currently unknown 
whether FMRP is sequestered by the C9orf72 expanded 
GGGGCC-repeats. Further work will be required to determine whether specific 
proteins bind to the repeats and what role they might play in disease 
pathogenesis. If RNA G-quadruplex binding proteins were 
sequestered by the expanded repeats, thousands of potential target genes 
with predicted G-quadruplex structures in their regulatory regions 
could be affected 40. 

SCIENTIFIC REPORTS | 2 : 1016 | DOI: 10.1038/srep01016 

An intriguing finding, which carries relevance for neurons, is 
that G-quadruplexes in the 3' UTR of mRNAs are necessary and 
sufficient for the localization of these mRNAs to dendrites, 
through interactions with proteins such as FMRP 12. Whether the 
accumulation of the C9orf72 expanded RNA causes sequestration 
of proteins relevant in mRNA transport needs to be addressed; 
however, this highlights the possibility of RNA alterations in 
C9orf72 ALS/FTD that are not limited to RNA quantity and splicing 
defects. Therefore, the study of C9orf72 ALS/FTD may need to 
couple quantitative RNA sequencing studies with RNA localization 
investigations in order to fully understand disease pathogenesis. 
Finally, small molecules have been identified that interact 
with G-quadruplexes 13, and our data suggest that they may have 
use as potential therapeutics in ALS and FTD caused by C9orf72 
repeat expansions. 

Methods 

RNA oligonucleotide sample preparation and annealing. An HPLC purified RNA 
oligonucleotide of sequence GGGGCCGGGGCCGGGGCCGGGGC was purchased 
from Integrated DNA Technologies, supplied as a lyophilized powder and 
reconstituted in ultrapure water to a stock concentration of 2.5 mM. We termed this 
sequence the C9orf72 minimal G-quadruplex repeat unit (C9Gru). Annealing was 
carried out by heating the samples to 90°C and allowing them to cool overnight to 
20°C. 

NMR spectroscopy. NMR data were acquired using a Bruker AVANCE 500 NMR 
spectrometer operating at a proton resonance frequency of 500.13 MHz and 
equipped with a QNP cryoprobe. 1D 1H NMR spectra of the RNA sample in 
90% H2O/10% D2O were acquired and processed using Topspin (version 2.1, Bruker 
Biospin, Karlsruhe) and an excitation sculpting pulse sequence (Bruker pulse program 
zgesp). Spectra of the RNA sample were acquired before annealing in the presence of 
10 mM potassium phosphate buffer, pH 7.0, and after annealing in the presence of 
40 mM KCl and 10 mM potassium phosphate buffer, pH 7.0. Samples were 
equilibrated in both cases at a calibrated temperature of 298 K. Data were accumulated 
centered at the solvent resonance with 1024 transients over a frequency width of 
10.33 kHz into 32 K data points for an acquisition time aq = 1.58 s with a relaxation 
delay d1 = 2 s between transients using a 90° radiofrequency (r.f.) pulse (p1 = 13 µs 
at a power level of 2 dB). 
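
The stated acquisition time follows from the other acquisition parameters, as the short consistency check below illustrates. It assumes the usual Bruker convention that the 32 K data points count real and imaginary points together, so that aq = TD / (2 x SW); the function names are illustrative, and the total-time estimate ignores pulse and gradient durations.

```python
def acquisition_time(td_points: int, sw_hz: float) -> float:
    """Acquisition time in seconds, assuming TD counts real + imaginary points."""
    return td_points / (2.0 * sw_hz)


def spectral_width_ppm(sw_hz: float, spectrometer_mhz: float) -> float:
    """Spectral width in ppm: width in Hz divided by the 1H frequency in MHz."""
    return sw_hz / spectrometer_mhz


TD = 32 * 1024        # "32 K data points"
SW_HZ = 10_330.0      # 10.33 kHz frequency width
SF_MHZ = 500.13       # proton resonance frequency
N_TRANSIENTS = 1024
D1_S = 2.0            # relaxation delay

aq = acquisition_time(TD, SW_HZ)              # about 1.59 s, reported as 1.58 s
sw_ppm = spectral_width_ppm(SW_HZ, SF_MHZ)    # about 20.7 ppm
total_min = N_TRANSIENTS * (aq + D1_S) / 60   # roughly one hour per spectrum

print(f"aq = {aq:.3f} s, SW = {sw_ppm:.2f} ppm, experiment ~ {total_min:.0f} min")
```

A width of about 20.7 ppm centred on the solvent resonance comfortably covers the imino region between 10 and 14 ppm discussed in the Results.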

Circular dichroism. RNA concentrations ranged from 4.6 to 0.46 µM and are 
indicated in the results section. The following buffers were used: 40 mM KCl, 10 mM 
potassium phosphate, pH 7.0; 50 mM LiCl, 10 mM sodium phosphate, 0.35 mM 
KCl, pH 7.0. CD experiments were performed at temperatures between 15°C and 
95°C, with a 1°C/min temperature gradient, using a Jasco J715 spectropolarimeter 
(Jasco Hachioji, Tokyo, Japan) equipped with a Jasco Peltier temperature control 
system. A CD spectrum of the buffer was recorded and subtracted from the spectrum 
obtained for the RNA-containing solution. Data were zero-corrected between 
340 and 350 nm. 
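
The two processing steps described above can be sketched as follows. This is an illustration only, not the software used for the study: it assumes that zero-correction means subtracting the mean residual signal in the 340 to 350 nm window, where nucleic acids show no CD signal, and the function name and example values are invented for the demonstration.

```python
def process_cd(wavelengths_nm, sample_mdeg, buffer_mdeg, zero_window=(340.0, 350.0)):
    """Subtract the buffer spectrum, then shift so the mean over zero_window is zero."""
    if not (len(wavelengths_nm) == len(sample_mdeg) == len(buffer_mdeg)):
        raise ValueError("wavelength, sample and buffer arrays must have equal length")

    # Step 1: buffer subtraction, point by point.
    subtracted = [s - b for s, b in zip(sample_mdeg, buffer_mdeg)]

    # Step 2: zero-correction using the mean signal inside the reference window.
    low, high = zero_window
    window = [v for w, v in zip(wavelengths_nm, subtracted) if low <= w <= high]
    if not window:
        raise ValueError("no data points fall inside the zero-correction window")
    offset = sum(window) / len(window)
    return [v - offset for v in subtracted]


# Invented example values (mdeg); not measurements from this study.
wavelengths = [237.0, 262.0, 340.0, 345.0, 350.0]
sample = [-3.0, 12.0, 1.0, 1.0, 1.0]
buffer_blank = [0.5, 0.5, 0.5, 0.5, 0.5]

corrected = process_cd(wavelengths, sample, buffer_blank)
print(corrected)
```

After correction the example spectrum keeps its minimum at 237 nm and maximum at 262 nm while the 340 to 350 nm baseline sits at zero, which is what makes spectra recorded at different temperatures directly comparable.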